AITopics | cross-lingual word

Cross-lingual Language Model Pretraining

Alexis CONNEAU, Guillaume Lample

Neural Information Processing Systems | Feb-13-2026, 22:22:03 GMT

Neural Information Processing Systems http://nips.cc/

cross-lingual language model, language model, machine translation, (11 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.76)

Cross-lingual Language Model Pretraining

Alexis CONNEAU, Guillaume Lample

Neural Information Processing Systems | Aug-20-2025, 01:06:46 GMT

Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining.

language model, machine translation, translation, (13 more...)

Country:

North America > Canada (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.76)

Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages

Hangya, Viktor, Severini, Silvia, Ralev, Radoslav, Fraser, Alexander, Schütze, Hinrich

arXiv.org Artificial Intelligence | Nov-21-2023

Very low-resource languages, having only a few million tokens worth of data, are not well-supported by multilingual NLP approaches due to poor quality cross-lingual word representations. Recent work showed that good cross-lingual performance can be achieved if a source language is related to the low-resource target language. However, not all language pairs are related. In this paper, we propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach, that incorporates intermediate related languages to bridge the gap between the distant source and target. We build MWEs one language at a time by starting from the resource rich source and sequentially adding each language in the chain till we reach the target. We extend a semi-joint bilingual approach to multiple languages in order to eliminate the main weakness of previous works, i.e., independently trained monolingual embeddings, by anchoring the target language around the multilingual space. We evaluate our method on bilingual lexicon induction for 4 language families, involving 4 very low-resource (<5M tokens) and 4 moderately low-resource (<50M) target languages, showing improved performance in both categories. Additionally, our analysis reveals the importance of good quality embeddings for intermediate languages as well as the importance of leveraging anchor points from all languages in the multilingual space.

computational linguistic, low-resource language, target language, (13 more...)

2311.12489

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(11 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Call for More Rigor in Unsupervised Cross-lingual Learning

Artetxe, Mikel, Ruder, Sebastian, Yogatama, Dani, Labaka, Gorka, Agirre, Eneko

arXiv.org Machine Learning | Apr-30-2020

In natural language processing, the main promise of multilingual learning is to bridge the digital language divide, to enable access to information and technology for the world's 6,900 languages (Ruder et al., 2019). For the purpose of this paper, we define "multilingual learning" as learning a common model for two or more languages from raw text, without any downstream task labels. Common use cases include translation as well as pretraining multilingual representations. We will use the term interchangeably with "cross-lingual learning".

[...] work implicitly includes monolingual and cross-lingual signals that constitute a departure from the pure setting. We review existing training signals as well as other signals that may be of interest for future study (§4). We then discuss methodological issues in UCL (e.g., validation, hyperparameter tuning) and propose best evaluation practices (§5). Finally, we provide a unified outlook of established research areas (cross-lingual word embeddings, deep multilingual models and unsupervised machine translation) in UCL (§6), [...]

computational linguistic, linguistic, proceedings, (16 more...)

doi: 10.18653/v1/2020.acl-main.658

2004.14958

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
Asia > China > Hong Kong (0.05)
(20 more...)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

A Survey of Cross-lingual Word Embedding Models

Ruder, Sebastian, Vulić, Ivan, Søgaard, Anders

Journal of Artificial Intelligence Research | Aug-12-2019

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent, modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.

cross-lingual word, proceedings, representation, (17 more...)

doi: 10.1613/jair.1.11640

AI Access Foundation

11640

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.27)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Overview (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(4 more...)

Bilingual Lexicon Induction through Unsupervised Machine Translation

Artetxe, Mikel, Labaka, Gorka, Agirre, Eneko

arXiv.org Artificial Intelligence | Jul-24-2019

A recent research line has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on the recent work on unsupervised machine translation. This way, instead of directly inducing a bilingual lexicon from cross-lingual embeddings, we use them to build a phrase-table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method can work with any word embedding and cross-lingual mapping technique, and it does not require any additional resource besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-lingual embeddings, our proposed method obtains an average improvement of 6 accuracy points over nearest neighbor and 4 points over CSLS retrieval, establishing a new state-of-the-art in the standard MUSE dataset.
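The two retrieval baselines named in the abstract above, nearest neighbor and CSLS, can be illustrated with a minimal sketch. It assumes the usual CSLS definition, score(s, t) = 2·cos(s, t) − r_T(s) − r_S(t), where r_T(s) is the mean cosine similarity of s to its k nearest target words and r_S(t) is the mean cosine similarity of t to its k nearest source words. The function names, the value of k, and the two-dimensional toy vectors are hypothetical and not taken from the paper; this is not the authors' machine-translation-based method, only the baselines it is compared against.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_top_k(values, k):
    """Mean of the k largest values (or of all values if fewer than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)

def induce_lexicon(src, tgt, k=2, use_csls=True):
    """Map each source word to its best-scoring target word.

    src, tgt: dicts from word to vector, assumed to already live in a
    shared cross-lingual space. With use_csls=False this is plain
    nearest-neighbor retrieval by cosine similarity.
    """
    src_words, tgt_words = list(src), list(tgt)
    sim = {s: {t: cosine(src[s], tgt[t]) for t in tgt_words} for s in src_words}
    # Average similarity of each word to its k nearest cross-lingual neighbors.
    r_src = {s: mean_top_k(sim[s].values(), k) for s in src_words}
    r_tgt = {t: mean_top_k([sim[s][t] for s in src_words], k) for t in tgt_words}

    lexicon = {}
    for s in src_words:
        if use_csls:
            # CSLS penalizes words that are close to everything ("hubs").
            lexicon[s] = max(tgt_words, key=lambda t: 2 * sim[s][t] - r_src[s] - r_tgt[t])
        else:
            lexicon[s] = max(tgt_words, key=lambda t: sim[s][t])
    return lexicon

# Hypothetical toy embeddings, for illustration only.
src_emb = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
tgt_emb = {"perro": [0.9, 0.1], "gato": [0.1, 0.9]}
nn_lexicon = induce_lexicon(src_emb, tgt_emb, use_csls=False)
csls_lexicon = induce_lexicon(src_emb, tgt_emb, use_csls=True)
```

On well-separated toy data like this the two scores agree; they tend to differ when some target words act as hubs that are close to many source words.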

artificial intelligence, computational linguistic, natural language, (13 more...)

1907.10761

Country:

Europe (0.71)
North America > United States (0.68)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

Wada, Takashi, Iwata, Tomoharu

arXiv.org Artificial Intelligence | Sep-7-2018

We propose an unsupervised method to obtain cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call multilingual neural language models, takes sentences of multiple languages as an input. The proposed model contains bidirectional LSTMs that perform as forward and backward language models, and these networks are shared among all the languages. The other parameters, i.e. word embeddings and linear transformation between hidden states and outputs, are specific to each language. The shared LSTMs can capture the common sentence structure among all languages. Accordingly, word embeddings of each language are mapped into a common latent space, making it possible to measure the similarity of words across multiple languages. We evaluate the quality of the cross-lingual word embeddings on a word alignment task. Our experiments demonstrate that our model can obtain cross-lingual embeddings of much higher quality than existing unsupervised models when only a small amount of monolingual data (i.e.

artificial intelligence, machine learning, natural language, (18 more...)

1809.02306

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On the Limitations of Unsupervised Bilingual Dictionary Induction

Søgaard, Anders, Ruder, Sebastian, Vulić, Ivan

arXiv.org Machine Learning | May-9-2018

Unsupervised machine translation (i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible, but nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
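The "identical words" signal mentioned in the abstract above can be sketched as follows. This is a minimal illustration that assumes the seed dictionary is simply the set of strings spelled the same in both vocabularies; the function name, the `min_len` filter, and the toy vocabularies are hypothetical, and the abstract does not specify how the authors use the resulting pairs.

```python
def identical_word_seeds(src_vocab, tgt_vocab, min_len=1):
    """Collect (source, target) seed pairs from identically spelled words.

    Words present in both vocabularies (names, numbers, loanwords) are
    treated as weak translation pairs. Source order is preserved and
    duplicates in src_vocab are skipped.
    """
    tgt_set = set(tgt_vocab)
    seen = set()
    seeds = []
    for word in src_vocab:
        if word in tgt_set and word not in seen and len(word) >= min_len:
            seeds.append((word, word))
            seen.add(word)
    return seeds

# Hypothetical toy vocabularies, for illustration only.
german_vocab = ["berlin", "hund", "2018", "taxi"]
english_vocab = ["taxi", "dog", "berlin", "2018"]
seed_pairs = identical_word_seeds(german_vocab, english_vocab)
```

Such pairs could then serve as the initial dictionary for any supervised or semi-supervised embedding mapping method, in place of a hand-built lexicon.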

artificial intelligence, natural language, proceedings, (16 more...)

1805.0362

Country: Europe > Ireland (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
